Choose a bin width and a center value, e.g. one-hour bins centered at the integers are written \((5.5, 6.5]\), \((6.5, 7.5]\), \((7.5, 8.5]\), etc. Bins must not overlap, and there must be enough bins to cover the data completely.
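Assign each runner to a bin, e.g. with bin edges at whole hours, 12.98 goes into the \((12, 13]\) bin and 12.0 goes into the \((11, 12]\) bin, because each bin excludes its left endpoint and includes its right endpoint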
Plot bars for each bin, with the height of the bar corresponding to the number of runners in that bin
ggplot(ultrarunning) +
  geom_histogram(aes(x = pb100k_dec), binwidth = 1, center = 10,
                 fill = "grey", color = "black") +
  labs(x = "Personal best time (hours)", y = "Count") +
  theme(text = element_text(size = 24))
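The three binning steps can also be carried out by hand, which makes the \((a, b]\) convention explicit. The sketch below uses base R's `cut()` and `table()`. The `times` vector holds made-up values chosen for illustration; they are not taken from the ultrarunning data.

```r
# Hypothetical personal-best times in hours (illustrative, not from the dataset)
times <- c(6.2, 7.5, 9.8, 10.5, 10.6, 12.0, 12.98, 14.3)

# Step 1: one-hour bins centered at the integers, i.e. edges at 5.5, 6.5, ..., 14.5
edges <- seq(5.5, 14.5, by = 1)

# Step 2: assign each time to a bin; right = TRUE gives intervals of the form (a, b]
bins <- cut(times, breaks = edges, right = TRUE)

# Step 3: the bar heights are the counts per bin
counts <- table(bins)
print(counts)

# With edges at whole hours instead, a value on an edge falls in the bin to its left
edge_example <- cut(c(12.98, 12.0), breaks = 5:15, right = TRUE)
print(as.character(edge_example))
```

As far as the ggplot2 defaults go, `geom_histogram()` also closes bins on the right, so `binwidth = 1, center = 10` should produce the same bins as `edges` here.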
Bin width of 10 hours – too large
Bin width of 3 minutes – too small
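A rough way to see why both extremes fail is to count how many bins each width produces and how full they are. This is a minimal sketch on simulated times (uniform between 6 and 20 hours, an arbitrary assumption and not the ultrarunning data); `count_per_bin` is a helper name introduced here for illustration.

```r
set.seed(1)
# Simulated times in hours, purely illustrative (not the ultrarunning data)
times <- runif(200, min = 6, max = 20)

# Count observations per non-empty bin of a given width (in hours).
# A time x in ((k - 1) * width, k * width] gets bin index k.
count_per_bin <- function(x, width) {
  table(ceiling(x / width))
}

wide   <- count_per_bin(times, width = 10)      # 10-hour bins
narrow <- count_per_bin(times, width = 3 / 60)  # 3-minute bins, expressed in hours

cat("10-hour bins: ", length(wide), "non-empty bins, largest count", max(wide), "\n")
cat("3-minute bins:", length(narrow), "non-empty bins, largest count", max(narrow), "\n")
```

With very wide bins almost all of the shape is hidden inside one or two bars, while with very narrow bins most bars hold only a handful of runners and the plot is dominated by noise. Because `pb100k_dec` is measured in hours, the corresponding `geom_histogram()` settings would be `binwidth = 10` and `binwidth = 3 / 60`.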
Boxplots
Find the five-number summary: minimum, lower quartile, median, upper quartile, maximum
ggplot(ultrarunning) +
  geom_boxplot(aes(x = pb100k_dec, y = 1)) +
  labs(x = "Personal best time (hours)") +
  scale_y_continuous(breaks = NULL, name = NULL, limits = c(0, 2)) +
  theme(text = element_text(size = 24))
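The five-number summary behind a boxplot can be computed directly. The sketch below uses base R's `quantile()` with its default quartile definition, which is only one of several conventions, on made-up times (hypothetical values, not from the ultrarunning data).

```r
# Hypothetical personal-best times in hours (illustrative, not from the dataset)
times <- c(6.2, 7.5, 9.8, 10.5, 10.6, 12.0, 12.98, 14.3)

# Minimum, lower quartile, median, upper quartile, maximum
five_num <- quantile(times, probs = c(0, 0.25, 0.5, 0.75, 1))
names(five_num) <- c("minimum", "lower quartile", "median",
                     "upper quartile", "maximum")
print(five_num)
```

Note that `geom_boxplot()` does not always draw its whiskers out to the minimum and maximum: by default they stop at the most extreme observation within 1.5 times the interquartile range of the box, and anything beyond is plotted as an individual point.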
Example: Hodgkin Lymphoma
Cancer of the lymphatic system
Occurs most often in young adults (ages 20-29) and the elderly (ages 75-84)
HL age of diagnosis in UK females
Potentially misleading conclusion when looking at boxplot alone
Boxplots do not show multimodality
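A small simulation illustrates the point. The ages below are drawn from a mixture of two normal distributions with peaks near 25 and 80; these parameters are assumptions chosen to mimic a two-peaked shape and are not the UK diagnosis data.

```r
set.seed(42)
# Simulated ages with two peaks, purely illustrative (not the UK diagnosis data)
ages <- c(rnorm(500, mean = 25, sd = 4), rnorm(500, mean = 80, sd = 4))

# What a boxplot would display: the five-number summary
five_num <- quantile(ages, probs = c(0, 0.25, 0.5, 0.75, 1))
print(round(five_num, 1))

# What a histogram with 10-year bins would display: counts per bin
counts <- table(cut(ages, breaks = seq(0, 110, by = 10)))
print(counts)
```

The five numbers describe a single wide box whose median sits between the two groups, in an age range where the bin counts show hardly any observations. Only the binned counts reveal that there are two separate peaks.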
Barcharts
# Create a new variable to store the surface type
ultrarunning <- ultrarunning %>%
  mutate(pb_surface_name = case_when(
    pb_surface == 1 ~ "trail",
    pb_surface == 2 ~ "track",
    pb_surface == 3 ~ "road",
    pb_surface == 4 ~ "mix of all three"
  ))
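The bar heights in a bar chart are simply the number of observations in each category. As a minimal sketch, the recoding and counting can be done in base R with `factor()` and `table()`. The `pb_surface` codes below are made up for illustration and are not taken from the dataset.

```r
# Hypothetical surface codes for seven runners (illustrative, not from the dataset)
pb_surface <- c(1, 3, 3, 2, 4, 1, 3)

# Map the numeric codes 1-4 to the same names used in the recoding above
surface_names <- c("trail", "track", "road", "mix of all three")
pb_surface_name <- factor(pb_surface, levels = 1:4, labels = surface_names)

# One count per category: these are the bar heights
counts <- table(pb_surface_name)
print(counts)
```

With the recoded `ultrarunning` data, `geom_bar(aes(x = pb_surface_name))` would do this counting itself and draw one bar per surface type.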
Samtleben, E. (2023). Ultrarunning dataset. Teaching of Statistics in the Health Sciences Resource Portal. Available at https://www.causeweb.org/tshs/ultra-running/.